Abstract

The development of culture-independent techniques and next-generation sequencing has led to a 
staggering rise in the number of microbiome studies over the last decade. Although it remains 
important to identify the taxa of microbes present in a variety of environmental samples, including the 
gut microbiomes of healthy and diseased individuals, the next stage of microbiome research will need 
to focus on uncovering the role of the microbiome rather than its mere composition. Here, we 
introduce techniques that go beyond identifying the taxa present within a sample and examine the 
biological function of the microbiome or the host-microbiome interaction. 



Introduction 

Over the past 10 years, there has been a dramatic rise in 
the number of microbiome studies [1]. Two key 
developments, which have led to the recent explosion 
of interest in the microbiome and studies cataloging and 
defining the nature of microbial communities in a variety 
of biological systems, are the introduction of culture- 
independent analysis techniques and the development of 
next-generation sequencing along with advances in the 
bioinformatics support needed to facilitate this analysis. 
This has allowed researchers to circumvent the need to 
culture bacteria for identification. This is particularly 
beneficial given the difficulty of culturing both
obligate anaerobic bacteria and other bacteria with
unique or as-yet-undefined growth requirements. In 
addition, it is beneficial to study the microbiome as a 
whole as many organisms are co-dependent on each 
other within a niche. For example, many microbes take 
advantage of the metabolic abilities of other microbes 
within a community that break down compounds that 
they cannot digest by themselves, or they remove 
metabolic by-products of synergistic bacteria, allowing
more efficient use of dietary substrates [2,3]. Studying
any given microbe in isolation will undoubtedly provide 
a wealth of information regarding its functional capacity 
but will not necessarily be reflective of its role in the larger 
complex microbiome from which it was isolated. 

The culture-independent technique that has become the 
"gold standard" for microbial profiling is 16S ribosomal 
RNA (rRNA) gene sequencing. The use of the 16S rRNA
gene for identifying bacterial taxa was pioneered by Carl 
Woese and others in the 1980s when it was shown that 
phylogenetic relationships of bacteria could be deter- 
mined by comparing a stable portion of the genome, 
with the 16S rRNA gene being one of several possible 
marker genes found in all bacteria and archaea [4,5]. By 
using universal primers to constant regions of the 16S 
rRNA gene, one can amplify and sequence various 
hypervariable regions within this gene. These hypervari- 
able regions have a high degree of interspecies variability 
that can be compared with known sequences in reference 
databases for taxonomic identification [6]. Although the
idea is simple and elegant, there are various decisions
that must be made during sample preparation and data
analysis that can greatly influence the conclusions. For 
information regarding these technical decisions and 
how they influence the analysis as well as a detailed 
description of 16S rRNA sequencing and its caveats, 
please see [7,8]. 

In addition to these technical considerations,
the selection of sample type will greatly affect the
conclusions of a sequencing analysis. For example,
studies of the gut microbiome typically rely on the 
sampling of stool. This is primarily due to sample 
quantity and the non-invasive nature of stool collection, 
whereas endoscopic biopsies are much more invasive 
and limited in quantity but may provide a more accurate 
picture of those microbes directly in contact with, or 
more likely to influence, the host. Studies have shown
that the stool microbiome differs substantially from
the mucosa-associated microbiome
obtained from biopsy samples [9,10]. Even
within biopsy samples, absolute numbers and the 
diversity of the microbes present vary along the length 
of the gastrointestinal tract [9,10]. 

While DNA sequencing has been available since the 
1970s, this traditional sequencing, such as Sanger 
sequencing, was very costly, had a low throughput, and 
took a considerable amount of time to perform. The 
invention of newer massively parallel sequencing
technology, commonly referred to as next-generation 
sequencing, has led to a remarkable increase in the 
number of microbiome studies. Although next- 
generation sequencing techniques generally offer 
shorter contiguous DNA sequencing reads 
compared with Sanger sequencing, they do allow 
millions of sequencing reactions to occur in parallel 
within a single sequencing run [11]. This has 
significantly driven down the cost of sequencing and 
will likely continue to do so as this technology 
advances. The impact that 16S rRNA gene sequencing 
paired with next-generation sequencing has had on the 
field of the microbiome is evident from the surge in the 
number of references obtained from a simple PubMed 
search. In 2001, there were 74 articles that cited 
"microbiome", whereas in 2013 there were 3,254. 

Most of the studies undertaken so far have served to 
catalog the microbial species present in an environmental 
sample (for example, in the gut microbiome of healthy 
individuals) [12,13]. A number of studies have focused 
on examining microbial profiles associated with partic- 
ular disease pathology. This type of cataloging study has 
been done extensively in subjects with inflammatory 
bowel disease (IBD) [14-19], in which there is strong 
evidence to suggest that the microbiome is a major
contributing factor to disease development [20-25].
Although all of this information has contributed 
significantly to our understanding of the composition 
of the human gut microbiome, additional work and 
alternative analyses are needed to understand how a 
given microbial community functions and interacts with 
host physiology, and influences health and disease. 

When one looks at these cataloging studies as a whole,
there does not appear to be a "core
microbiota", meaning that no species or group
of species appears to be common to everyone. It is
possible that different species or communities as a whole 
have functional redundancies. If one considers the micro- 
biome as a complex organism, it is important to examine 
the functional capacity of that microbiome as it may reflect 
"core" functions and differentiate healthy from disease- 
related microbiomes. This suggests that understanding the 
function of the microbiome may be more important 
than simply cataloging the individual components 
(i.e. organisms). The next phase in microbiome studies 
will undoubtedly focus on biological function, extending 
beyond descriptions of the organisms that are present, to 
understand how a given microbiome population may 
function to affect the host and participate in disease 
processes. Here, we examine techniques that go beyond 
identifying the taxa present within a sample and examine 
the biological function of the microbiome or the host- 
microbiome interaction. 

Metagenomics by shotgun sequencing 

Metagenomics is the study of all genes contained within 
a community and theoretically allows assessment of the 
functional potential of a given microbiome, including 
bacterial, eukaryotic, and viral functions [26,27]. 
Although this is potentially a very powerful tool, it is 
important to keep in mind that identifying the presence 
of a gene with an assigned function is different from 
knowing whether the gene is actually expressed 
(transcribed and translated). 

It has been suggested that there is a core functional 
microbiome, meaning that although there is no evidence 
for a set of common species shared between all 
individuals, there nonetheless may be common genes or 
pathways detected in the microbiome of all individuals 
[28,29]. However, certain functional pathways may be 
enriched because they are required for existence rather than 
indicating a role for these pathways in host-microbiome 
interactions. For example, Lozupone et al. [30] showed 
that the core functions of the gut microbiome include 
metabolic pathways important for survival of the organ- 
isms living in the gut environment, such as carbohydrate 
and amino acid metabolism. 

The most common way to perform metagenomics is
through the application of shotgun sequencing. In shotgun
sequencing, extracted community DNA is sheared and
sequenced. The resulting reads are compared (for example, by BLAST)
against various databases, such as the functional database Kyoto
Encyclopedia of Genes and Genomes (KEGG) or the protein
database Clusters of Orthologous Groups (COG), in
order to obtain individual pathway or gene assignments.
This provides a catalog of the gene sequences
and their corresponding COG classifications that are
present in the given sample. Depending on the level of
resolution (the ability to assign reads to broad metabolic
functions or to very specialized functions), it can be difficult
to extrapolate the results into biological relevance. For
example, when data are presented by COG category,
sequencing reads may fall into categories such as "J",
"T", or "Q", which denote translation, signal transduction,
and secondary metabolite biosynthesis, transport,
and catabolism, respectively [31]. Yet such functional
categories are very broad, making it difficult to identify
specific elements of a given pathway.
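
To make the resolution problem concrete, the following minimal Python sketch tallies read assignments by COG category. The three category descriptions are those given in the text; the read identifiers, their assignments, and the input format are hypothetical and were invented purely for illustration.

```python
from collections import Counter

# COG functional categories named in the text, with the descriptions given there.
COG_CATEGORIES = {
    "J": "Translation",
    "T": "Signal transduction",
    "Q": "Secondary metabolite biosynthesis, transport, and catabolism",
}

def summarize_cog(assignments):
    """Return the fraction of assigned reads falling in each COG category.

    `assignments` maps a read ID to a COG category letter, or to None
    when the read could not be assigned (hypothetical input format).
    """
    counts = Counter(cat for cat in assignments.values() if cat is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}

# Hypothetical read assignments, for illustration only.
reads = {"read1": "J", "read2": "J", "read3": "T", "read4": "Q", "read5": None}
for cat, frac in sorted(summarize_cog(reads).items()):
    print(f"{cat} ({COG_CATEGORIES[cat]}): {frac:.2f}")
```

A summary of this kind indicates that half of the assigned reads relate to translation, but it says nothing about which specific genes or pathway steps are present, which is the limitation described above.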

The main downside of metagenomic analysis is that 
shotgun sequencing requires extensive sequencing reads 
(i.e. increased depth of sequencing) to ensure sufficient 
coverage of the entire metagenome [32]. This is important 
because variable functions that could be biologically 
important, such as butyrate or antibiotic production, 
which are usually restricted to select species or strains, may 
be difficult to detect if sequencing depth is insufficient 
[30]. In addition, a large proportion of the sequences 
generated are unassigned because they are not annotated 
or are not found in current databases. This reflects the 
incomplete collection of curated reference genomes 
available for comparisons. Currently, there does not 
appear to be a consensus in the literature regarding the 
depth of sequencing required to capture all of the 
functions of the microbiome. It has been suggested that 
the amount of sequencing data required to capture all of 
the bacterial functions of the human gut microbiome is 
approximately 7 gigabytes, but this estimate is difficult to
make because it depends on bacterial genome size,
the sequencing coverage of the bacterial genome, the presence
of plasmids, the complexity of the community, and the error
rate of the sequencing technology employed [33,34].
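
The dependence of detection on sequencing depth can be illustrated with a rough calculation. The sketch below is a simplification under stated assumptions: reads are taken to be drawn in proportion to each genome's share of community DNA, and the fraction of a genome sampled follows an idealized Poisson (Lander-Waterman style) model. The function names and all numerical inputs are hypothetical and are not taken from the studies cited here.

```python
import math

def expected_coverage(total_bases, dna_fraction, genome_size):
    """Mean per-base coverage of one genome in a shotgun run.

    total_bases:  total bases sequenced for the whole community
    dna_fraction: share of community DNA from this genome (0 to 1)
    genome_size:  genome length in bases
    """
    return total_bases * dna_fraction / genome_size

def fraction_covered(coverage):
    """Expected fraction of the genome hit by at least one read,
    under an idealized Poisson (Lander-Waterman style) model."""
    return 1.0 - math.exp(-coverage)

# Hypothetical run: 1e10 bases sequenced, 5 Mb genomes.
for frac in (0.10, 0.001, 0.0001):
    c = expected_coverage(1e10, frac, 5e6)
    print(f"DNA share {frac:.4f}: coverage {c:.2f}x, "
          f"~{fraction_covered(c):.1%} of genome sampled")
```

Under these assumptions, an organism contributing one ten-thousandth of the community DNA is sampled at well below 1x coverage, so a function restricted to that organism could easily be missed.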

Nevertheless, metagenomic sequencing has been used to
identify relevant pathways that may be worth targeting
therapeutically. For example, the microbiome of subjects
with IBD was shown to have decreased amino acid
biosynthesis and carbohydrate metabolism compared
with that of healthy subjects. Specifically, subjects with ileal
Crohn's disease (CD) were shown to have reduced vitamin
biosynthesis and increased oxidative stress compared with
subjects with ulcerative colitis (UC) or non-ileal CD [24]. These
pathways could potentially be assessed in greater detail 
by using gene-targeting strategies or methods that rely on 
the direct cloning of DNA fragments extracted from 
uncultured microbial communities to identify and exploit 
novel therapeutic molecules [35,36]. The use of these 
more targeted approaches has already been initiated in the 
field of soil microbiology to attempt to identify new 
antibiotic resistance genes or antimicrobials from uncultured
microorganisms [36-39]. For example, in one study
utilizing functional metagenomics, genes encoding anti- 
microbial agents were screened by using clone libraries 
constructed from DNA isolated from arid soil bacterial 
samples. After expression in Streptomyces albus, it was 
found that the recombinant bacteria showed inhibitory 
activity against methicillin-resistant Staphylococcus aureus 
as well as vancomycin-resistant Enterococcus faecalis [36]. 

These functional metagenomic approaches have only 
begun to be exploited to examine host-microbe interac- 
tions. In a two-part high-throughput screen, metage- 
nomic libraries generated from human fecal microbiota 
samples were cloned into bacterial cells. Lysates from 
these bacterial cell suspensions were subsequently added 
to eukaryotic cell lines, including a human colonic cell 
line, to identify genes that inhibit or enhance eukaryotic 
cell growth [40]. Screens like this could be adopted to test
a number of host outputs, helping not only to define host-microbe
interactions but also to direct drug development.

Metagenomics inferred by imputing genetic 
functional capacity: PICRUSt 

Owing to the inherent complexity and expense of the 
substantial number of sequencing reads required for 
functional diversity profiling using a shotgun sequencing 
approach, researchers have developed software that imputes 
function based on 16S rRNA profiles. The Phylogenetic 
Investigation of Communities by Reconstruction of Unobserved
States (PICRUSt) software uses genus or species
identifiers assigned from 16S rRNA gene sequencing data to
infer function based on known full reference genomes
[41]. This software takes into account several factors
important for metagenomic prediction, such as the availability
of pan and core genomes of microbial reference
taxa [42] and 16S rRNA copy number among bacterial taxa
[8]. The software generates functional classifications based 
on KEGG [43] orthology and COG [44]. As such, the same 
caveats apply to these data as with shotgun sequencing. 
Outputs from KEGG and COG can often represent 
broad functions, making it difficult to narrow the output 
down to specific elements of a given pathway. 
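
The core idea of imputing function from a 16S profile can be sketched in a few lines of Python. This is a simplified illustration and not the PICRUSt implementation or its interface: the function name, taxon names, 16S copy numbers, and gene contents are all hypothetical, and the sketch assumes that every taxon has a reference genome, whereas the real software must also infer gene content for taxa without one.

```python
def predict_gene_abundance(read_counts, copy_number, gene_content):
    """Estimate community gene-family abundance from 16S read counts.

    read_counts:  {taxon: number of 16S reads}
    copy_number:  {taxon: 16S rRNA gene copies per genome}
    gene_content: {taxon: {gene_family: copies per genome}}
    """
    predicted = {}
    for taxon, reads in read_counts.items():
        # Dividing by 16S copy number converts reads to a genome estimate.
        genomes = reads / copy_number[taxon]
        for family, copies in gene_content[taxon].items():
            predicted[family] = predicted.get(family, 0.0) + genomes * copies
    return predicted

# Hypothetical inputs, for illustration only.
read_counts = {"taxonA": 600, "taxonB": 100}
copy_number = {"taxonA": 6, "taxonB": 1}
gene_content = {
    "taxonA": {"family1": 1, "family2": 2},
    "taxonB": {"family2": 1, "family3": 3},
}
print(predict_gene_abundance(read_counts, copy_number, gene_content))
```

Although taxonA yields six times as many 16S reads as taxonB in this example, both are estimated at the same number of genomes once copy number is taken into account, which is why copy number matters for the prediction.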

The output from the PICRUSt software shows an average
correlation of 80% with shotgun sequencing [41] within
microbial communities, such as the human gut microbiome,
where the number of fully sequenced genomes is
greatest [45]. However, unlike shotgun sequencing,
PICRUSt does not reflect strain diversity such as that 
which may be observed in numerous bacterial species, 
including pathobiont strains [46]. It also cannot yet 
impute viral or eukaryotic organism function. Further- 
more, results from this analysis are dependent on, and 
thus biased by, the hypervariable region or regions (i.e. 
V1-V9) of 16S rRNA sequenced, as taxonomic assign- 
ment is affected by the region selected for sequencing 
[47]. In summary, PICRUSt serves as a "poor man's 
metagenomics". Although it still has room to improve, it 
is a promising inexpensive tool that can be applied to 
any study with 16S rRNA data, and continuous efforts to
sequence more gut microbial genomes will improve the 
accuracy of the software with time [41]. 

Metatranscriptomics 

Metatranscriptomics measures the total RNA present in a 
community of bacteria and, in a manner similar to 
metagenomics, provides a snapshot of the functional 
potential of a bacterial community as determined by the 
genes that have been activated and transcribed. The 
benefit of metatranscriptomics is that it has a greater 
likelihood of differentiating between genes that are 
expressed and active versus those which are not. In this 
way, metatranscriptomics provides not only a snapshot 
of the functional potential of a bacterial community but 
also a means of measuring the actual metabolic activity 
of that community. Initial research based on computa- 
tional analysis of the microbial genome suggested that 
prokaryotic transcription was a relatively simple process. 
However, more recent studies demonstrate that there is a
significant amount of complexity, with
functional RNA elements, including non-coding
RNAs and riboswitches, playing a larger
role than previously recognized. Furthermore, although
it was previously believed that the structure and 
transcription of operons were fairly static, recent studies 
have demonstrated the presence of alternative operon 
structures with increased regulatory potential [48]. Such 
functional complexity highlights the added benefit of 
using non-DNA-based methods in community analysis. 

Metatranscriptomics involves extraction of the total RNA
from a microbial community sample, conversion of total
RNA or enriched messenger RNA (mRNA) to complementary
DNA, and the use of either next-generation
sequencing or microarrays to determine which
portions of the genome are expressed. Despite the
advantages and knowledge gained from studies of the 
transcriptome, technical difficulties in processing this
genetic material often make this approach logistically
more difficult. Enrichment of mRNA is often necessary due
to the large amount (>75%) of rRNA and transfer RNA 
present in cells [49,50] and can be difficult as the 
polyadenylated tail used to isolate mRNA from eukaryotic 
organisms does not exist in bacterial cells. Furthermore, 
bacterial mRNA seems to be particularly unstable, with a 
very short half-life, demanding the immediate and 
efficient use of preservatives upon sampling or immediate 
preparation and extraction of RNA from samples [50-52]. 
Resulting reads must also be mapped against known 
bacterial genomes in order to identify the nature and 
function of a given gene, a major limitation given the low 
number of organisms with a fully sequenced genome. 
Despite these challenges, several recent studies have begun 
to offer insight into the functional role of microbial 
communities in human health. 

To date, metatranscriptomic studies of the human gut microbiome
have been small in scale and demonstrate a large
degree of inter-individual variability. However, similar to
metagenomic observations, these studies have revealed 
relatively high levels of transcripts corresponding to basic 
cellular processes. Transcripts corresponding to "RNA 
polymerase", "ribosome", "pyruvate metabolism", and 
"glycolysis" among others are detected at relatively 
increased abundances across samples [53,54]. Interest- 
ingly, several studies have demonstrated alterations in 
transcriptional responses within the microbiome of 
individuals exposed to different food sources [55,56]. 
Indeed, a large component of the metatranscriptome in 
stool is related to carbohydrate metabolism [53]. Most 
importantly, recent studies have shown that alterations in 
gene expression and function can occur in response to 
dietary or probiotic intervention, independent of alterations
in microbial community structure or composition,
further highlighting the importance of such
functional studies in understanding the true impact of 
environmental factors on the host microbiome [57]. 

Metabolomics 

Metabolomics is the study of complex biological samples
with the aim of identifying and quantifying small molecules that
are the by-products of metabolism. The term "metabolic
profile" is often used to describe the collection of 
metabolites found within a biological sample, much like 
the term microbiota is used to encompass all of the 
bacteria present within a given environmental sample. 
Both genetics and environment greatly alter metabolism, 
but metabolomics allows for monitoring of the outcome. 
Like the host, the microbiome produces metabolites that 
give us an idea of its function and how it may interact with 
the host through host-microbe co-metabolism. 

Metabolic profiles have already proven to be useful in
distinguishing between healthy and diseased individuals,
thus providing biomarkers of disease. One
example would be the use of metabolomics in IBD. 
Several studies have identified not only metabolites that 
distinguish healthy individuals from those with IBD [58- 
61] but also metabolites that distinguish between UC 
and CD [59,61]. The benefits of developing such
biomarkers based on blood, urine, or stool are that
they provide a non-invasive tool for disease
identification and phenotyping, mechanistic insight into
disease processes, and insight into drug metabolism,
including drug toxicity. They may also help in monitoring
therapeutic interventions and in predicting an
individual's response to therapy.

The "targeted" approach to metabolomics can measure a 
specific metabolite or class of metabolites (e.g. bile 
acids). An "untargeted" metabolomic approach provides 
a broad cataloging of many different classes of metabo- 
lites involving a broad range of metabolic pathways. This 
can be achieved by employing one or more of several 
analytical techniques available; however, it is important 
to note that no single analytical technique can be used to 
measure all types of metabolites. The most common 
techniques for metabolic profiling are mass spectrometry 
(MS) and nuclear magnetic resonance (NMR). Both of 
these techniques can be used to identify metabolic 
profiles in a variety of biological samples, including 
urine, serum, fecal extracts, and biopsy samples (e.g. 
biopsied colon tissue). MS is generally paired with an 
upstream separation technique, such as gas chromato- 
graphy (GC-MS) or liquid chromatography (LC-MS), 
and works by distinguishing molecules based on their 
mass-to-charge ratios and retention times. NMR, on the 
other hand, measures the absorption of electromagnetic 
radiation by different nuclei within a compound when it 
is placed under a magnetic field. Unlike MS, NMR is a 
non-destructive detection method that allows the sample 
to be recovered for further analysis. Additional benefits 
are that NMR often requires only small sample amounts, 
on the order of milligram or sub-milligram levels, and 
requires little to no sample preparation. For a more 
comprehensive review of metabolomics analytical techniques,
please see Lindon et al. [62]. Data analysis
also presents a major challenge with these
analytical methods: the technologies generate complex
datasets whose interpretation requires
the use of multivariate statistical analysis.
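
As a small worked example of the quantity on which MS separates molecules, the sketch below computes the mass-to-charge ratio of a protonated ion. It assumes positive-mode ionization by simple proton addition; the function name is illustrative, and the glucose mass is a standard monoisotopic value rather than a figure drawn from the studies discussed here.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons

def mz_protonated(neutral_mass, charge=1):
    """m/z of an [M + nH]n+ ion for a molecule of given monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

glucose = 180.06339  # monoisotopic mass of C6H12O6 in daltons
print(f"[M+H]+   m/z = {mz_protonated(glucose):.4f}")
print(f"[M+2H]2+ m/z = {mz_protonated(glucose, 2):.4f}")
```

The same molecule appears at a different m/z depending on its charge state, which is one reason retention time from the upstream chromatography step is used alongside m/z to distinguish metabolites.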

Studies of germ-free versus conventional animals (those 
which harbor a traditional microbiome) have shown 
that the microbiome is responsible for producing a
number of metabolites capable of interacting with the
host [63,64]. Importantly, the microbiome is responsible 
for producing short-chain fatty acids which are utilized 
by the host (e.g. serving as an important energy source 
for colonic epithelial cells) [65,66]. 

A series of elegant studies by the Hazen lab [67-69] has
shown how metabolomics identified a
role for the microbiome, linked with diet, in cardiovascular
disease. The initial study by Wang et al. [67] set out to
identify small molecules associated with individuals 
with cardiovascular disease. This untargeted blood screen 
found that the dietary metabolite phosphatidylcholine 
was converted by the gut microbiome to trimethylamine 
N-oxide (TMAO), which by an as-yet-unknown mechanism
contributes to atherosclerosis. Follow-up studies by this 
group utilizing both well-delineated human and animal 
studies show that the use of antibiotics suppressed the 
increase in plasma levels of TMAO when subjects were 
given a dietary phosphatidylcholine challenge [68] or L- 
carnitine challenge [69], providing further evidence that 
the production of TMAO is catalyzed by intestinal 
microbial metabolism. This is an excellent example of 
how the use of metabolomics can provide mechanistic 
insight into the role of the microbiome in disease 
pathogenesis and further generate new areas of study. 

Future of microbiome studies 

Although all of the -omics techniques discussed here are 
culture-independent, it continues to be important to 
study microbes in isolation to fully grasp their functional 
potential. Laboratory techniques have continued to 
progress, allowing more microbes than ever to be 
cultured [70]. By isolating these microbes, we can not
only interrogate their function but also genetically manipulate
them by using insertion sequencing [71] or other
techniques to determine which genes are vital to various
processes (e.g. colonization of the gut, growth with
certain dietary components, or competition with various
pathogens).

Employing -omics-based approaches will provide an 
understanding of the role of the microbiome in host 
health and disease. However, to determine whether the 
microbiome is causal in disease pathogenesis, integra- 
tion of these techniques with in vitro and in vivo studies 
will be required. Perhaps some of the most useful in vitro 
techniques include bioreactor systems that mimic the 
conditions found in the gastrointestinal tract. This allows 
culturing of bacteria within a complex microbial com- 
munity and a complex environment that more closely 
resembles the gut. For a review of a variety of in vitro 
techniques available, refer to reference [72]. 

Undoubtedly, the use of germ-free animals has not only
been useful for determining the function of the microbiome
in immune development and host physiology [73]
but has also proved to be a powerful tool to show causality
of the microbiome in disease. By colonizing germ-free
animals with defined or whole microbial communities, 
researchers are able to limit confounding variables, such as 
diet and environment, to isolate the contribution of the 
microbiome to pathologies such as obesity, diabetes, and 
cardiovascular disease to name a few [74,75]. More recent 
gnotobiotic studies (using animals that are germ-free, or were germ-free prior
to the introduction of microbes) have utilized mice; however,
there has also been an increase in the use of germ-free
zebrafish, because of the ease of working with this model
and its relatively low cost, and in the
use of germ-free pigs, as the immune system of this animal
more closely resembles that of humans. Each model has its
own benefits and disadvantages. For a review of gnoto- 
biotic animal models in microbiome research, please see 
references [72,76]. 

Conclusions 

The development and application of new technologies 
involved in next-generation sequencing have led to an 
explosion of investigative work focused on the char- 
acterization of the microbiome in health and disease. 
This has certainly supported the pre-existing notion that 
the gut microbiome is important in several gastro- 
intestinal diseases and has suggested a role for microbes 
in this niche in promoting diseases affecting other 
organs, such as cardiovascular disease, diabetes, and 
cancer [77,78]. Although we are still learning how to 
more accurately identify which bacteria are present and 
have begun to expand this inventory to include viruses 
and fungi, we must also go beyond cataloging and 
examine the functional role that the microbiome plays 
in health and disease. This will require the use of well- 
designed experiments paired with appropriate techni- 
ques, such as metagenomics, metatranscriptomics, and 
metabolomics, evaluated by more sophisticated bioin- 
formatic tools to assess the biological importance and 
mechanisms of the microbiome and host-microbiome 
interactions. 